\chapter{Other portable modules}

\section{Concrete syntax tree}
\label{sec-concrete-syntax-tree}

The Concrete Syntax Tree%
\footnote{https://github.com/s-expressionists/Concrete-Syntax-Tree}
module was designed to solve the problem of \emph{source tracking},
i.e., the association of some source code with some concept of
\emph{location} of that code in a source file.

This module handles the problem by wrapping S-expressions in standard
objects that can contain this additional information.  The standard
object in question mimics two kinds of S-expressions, namely
\texttt{cons} cells and \emph{atoms}.  Operations are provided by this
module to take the \texttt{first} and the \texttt{rest} of a standard
object that representes a \texttt{cons} cell, and to return the
\emph{raw} contents of a concrete syntax tree, i.e., the underlying
S-expression.

Furthermore, this module provides a certain number of higher-level
utilities for manipulating concrete syntax trees.  For example, it
contains a lambda-list parser for lambda lists that are represented as
concrete syntax trees.

A particularly important utility is one that \emph{reconstructs} a
concrete syntax tree from two arguments, namely a concrete syntax tree
and an S-expression.  This utility can be used in a compiler for macro
expansion.  \commonlisp{} macro expansion is specified to work on
S-expressions.  The problem, then, is to preserve source information
across macro expansion.  A compiler would then take a concrete syntax
tree, retrieve the raw underlying S-expression and pass that
S-expression to the macro function.  The macro function returns
another S-expression that typically contains some nodes of the
original S-expression and some nodes that were introduced by the macro
function.  The reconstruction utility tries to match nodes in the
output to nodes in the input, and construct a concrete syntax tree
from the result.  For \texttt{cons} cells, the task is fairly simple.
If the \emph{same} (as determined by \texttt{eq}) \texttt{cons} cell
exists in the input and in the output, the concrete syntax tree for
that \texttt{cons} cell in the input is reused in the reconstructed
concrete syntax tree.  For atoms, the problem is harder.  The concrete
syntax tree module uses some heuristics to provide a plausible
reconstruction.

The concrete syntax tree module also contains a parser for lambda
lists, represented as concrete syntax trees.  This parser uses the
Earley parsing technique, which we believe is a good fit for parsing
S-expressions.  However, the way the parser is expressed is currently
not satisfactory.  We are looking for alternative ways of expressing
parsers on S-expressions.

\section{First-class global environments}
\label{sec-first-class-global-environments}

Most \commonlisp{} implementations do not have an explicit
(first-class) representation for the global environments, and most
\commonlisp{} implementations have a single global environment.  This
environment is \emph{spread out} all over the system.  The
\texttt{symbol} objects of such an implementation may contain a slot
for a function named by that symbol and perhaps a slot for the global
value of a variable named by that symbol.  The mapping from class
names to class objects can also be represented as a slot in the symbol
object, but is more likely represented separately in a hash table.
Similarly, method combinations, \texttt{setf} expanders, and other
mappings from names to objects may be represented in yet another way.

\sysname{} uses \emph{first-class global environments} for all these
mappings.  There are several reasons for this way of representing
environments:

\begin{itemize}
\item It allows for multiple global environments, which in turn
  provides solutions to a number of common problems.  For example,
  several different versions of a system can be loaded simultaneously,
  provided that each version is loaded into a separate environment.
\item It becomes possible (and easy) to distinguish the three types of
  global environments that the \commonlisp{} standard allows (and
  sometimes encourages), namely run-time environments, evaluation
  environments, and compilation environments.
\item During bootstrapping of \sysname{} using a \emph{host}
  \commonlisp{} system, \sysname{} objects and mappings can be
  isolated from host objects and mappings by the use of first-class
  environments.
\end{itemize}

First-class environments also allow for other features such as
\emph{sandboxing} that are not directly used by \sysname{}.

This module has been extracted to a separate repository by the name of
\clostrum{}%
\footnote{https://github.com/s-expressionists/Clostrum} and it
provides several important features:

\begin{itemize}
\item The run-time environment is operated upon by a number of generic
  functions that mimic standard \commonlisp{} functions such as
  \texttt{fdefinition}, \texttt{macro-function}, etc., except that
  these functions have an additional parameter named \emph{client}.
  Client code can provide primary or auxiliary methods that specialize
  to some client-provided class, so as to override or extend the
  behavior of the default methods provided by \clostrum{}.
\item The evaluation environment is similar to the run-time
  environment, but also provides trampoline methods that calls the
  same generic function in the run-time environment whenever some
  object is not found in the evaluation environment.
\item The compilation environment is different, and can be used to
  hold imple\-mentation-specific information provided by the compiler
  during the compilation process, and queried by other parts of the
  compiler so as to provide feedback to the user in the form of errors
  or warnings.
\end{itemize}

Furthermore, \clostrum{} environments can be configured in multiple
\emph{layers}, so as to allow for \emph{incremental deltas}.  This
feature could be used in a text editor for parsing the contents of a
text buffer containing \commonlisp{} code.  When the buffer contents
is modified, side effects on the global environment by the old
contents can be undone by simply abandoning the upper layers while
preserving the environment contents associated with the code preceding
the modification.

\section{Lexical compile-time environments}
\label{sec-lexical-compile-time-environments}

A \commonlisp{} compiler must maintain a compile-time environment that
evolves during the compilation process.  Nested constructs such as
\texttt{let} and \texttt{labels} create new entries that are valid as
long as the construct is in scope. The module named \trucler{}%
\footnote{https://github.com/s-expressionists/Trucler} provides this
functionality.

\trucler{} provides a \clos{} \emph{protocol} for querying and
augmenting lexical environments.  The functionality provide has the
same purpose as the functions documented in section 8.5 of the second
edition of ``Common Lisp the language'' by Guy Steele
\cite{Steele:1990:CLL:95411}.  However the functionality documented
there is insufficient (it does not contain information introduced by
\texttt{block} and \texttt{tagbody} special operators, for instance),
and not extensible (because it relies on a fixed number of return
values).  In contrast, the protocol provided by \trucler{} allows
client code to customize the behavior.  For that purpose, \trucler{}
methods have a \emph{client} parameter that is provided by client
code.  Default \trucler{} methods do not specialize to this parameter,
but client code can provide primary or auxiliary methods that do
specialize to that parameter, making these methods applicable only for
that client.

\trucler{} provides a \emph{reference} implementation that can be used
by new \commonlisp{} implementations.  But \trucler{} has the
possibility of using client-specific environment with its protocol.
This feature allows the creation of compilers that can be used in
different \commonlisp{} implementations without the need to adapt the
environment-manipulation code to each implementation.  Furthermore,
\trucler{} provides two such \emph{native} implementations, namely for
\sbcl{} and \ccl{}.  We welcome contributions to \trucler{} in the
form of more such native implementations.

\section{S-expression syntax}
\label{sec-s-expression-syntax}

This library is technically not part of the \sysname{} project.  It
was written by Jan Moringen independently.  It contains s-expression
level parsers for all standard special forms and all standard macros.
It uses the architecture.builder-protocol to communicate with a module
that creates the result of the parse.

We use it together with the Iconoclast library, described in
\refSec{sec-ast} which creates abstract syntax trees represented as
a graph of instances of standard classes.

The library can be found here:
https://github.com/scymtym/s-expression-syntax 

\section{Abstract syntax trees}
\label{sec-ast}

The Iconoclast library defines a large number of classes used to
represent abstract syntax trees, or ASTs.  We use this library as the
first compilation phase in that the s-expression-syntax library
described in \refSec{sec-s-expression-syntax} is invoked to parse
the source code and Iconoclast creates a graph of AST instances that
represents all aspects of the source code, including source positions
for each expression.

The library can be found here: https://github.com/robert-strandh/Iconoclast

\section{High-level control-flow graph}

We created a library%
\footnote{https://github.com/robert-strandh/Hirundine}
for representing code as a control-flow graph.  The objects
manipulated by the library are \commonlisp{} objects, and the
instructions do not work on any aspects of the representation of the
objects.  They just make explicit the control flow that is implicit in
most operators in the \commonlisp{} standard.
